Introduction


NHL goalies are kind of an enigma. They’re notoriously hard to evaluate and predict in a sport that is already so chaotic. They are also incredibly important to their teams and often you will hear hockey analysts citing teams’ goalies as the reason that the team is preforming well or poorly.

We are interested specifically in evaluating NHL goalies for the purpose of understanding their contracts. How do teams decide how much to pay their goalies? We want to inspect goalies in the post-lockout seasons and take a look at the best, the worst and how much they get paid.

Objective


The objective of this report is to evaluate NHL goalies and their contracts in the post-lockout era and then build a model to try and predict how much goalies will get paid on their next contract.

NHL Goalies


We wanted to specifically preform this analysis during the post-lockout era (2013-present). Hockey has changed so much in these seasons, comparing goalies from post-lockout era hockey to pre-lockout doesn’t quite make sense.

Overview of Goalies

How many unique goalies played per season?

hr_data %>%
  distinct(player, szn) %>%
  count(szn, name = 'Number of goalies') %>%
  rename(Season = szn) %>%
  kable() %>%
  kable_styling(bootstrap_options = c('hover'), full_width = F)
Season Number of goalies
12-13 82
13-14 97
14-15 92
15-16 92
16-17 95
17-18 95
18-19 93

We see that ~95 goalies played each season. This may seem like a lot but this actually makes sense. From 2013-2017 there were 30 NHL teams in the league and from 2017-2019 there were 31. Each NHL team usually has a starting goalie, a back-up goalie, and a third-string “emergency” goalie. Teams calling-up goalies from their farm teams in emergency situations is also common. So you would (assuming that injuries are uniform) that number of teams in league * 3 is the approximate number of goalies who play each season.

In total there were 187 goalies that played at least 1 game in the NHL from 2013-2019. However, it makes no sense to evaluate a goalie based on their performance in just a few games because of the variation between events occuring in each hockey game. One of the many mysteries surrounding goalies is after how many games can we draw definitive conclusions about their quality of play? Let’s look at how many games goalies were playing per season and over their whole careers.

gp_szn <-
  ggplot(hr_data, aes(gp)) +
  geom_histogram(binwidth = 5, fill = '#a7d2cb', alpha = 0.8) +
  labs(x = 'Games played per season by goalies', y = 'Frequency') +
  theme(plot.caption = element_text(color = 'grey50'))

gp_overall <-
  hr_data %>%
  group_by(player) %>%
  summarise(gp = sum(gp)) %>%
  ggplot(aes(gp)) +
  geom_histogram(binwidth = 25, fill = 'thistle', alpha = 0.8) +
  labs(x = 'Games played overall by goalies', y = 'Frequency',
       caption = '1 Game played = 1 or more minute played on ice') +
  theme(plot.caption = element_text(color = 'grey50'))

gp_szn + gp_overall

In the case of this analysis, we were interested specifically in goalies who had played 40 or more games in the NHL from 2013-2019. Our reasoning behind this is that a full NHL season is 82 games so 40 games is about half a “season” played and seems like a reasonably large enough sample to evaluate a goalie’s play. The one downside to this evaluation is that it will devalue the contributions of back-up goalies, but back-up goalies usually don’t last as long as starters anyway and thus, their contracts tend to stay around league minimum.

Fact: In 2018-19, between goalies getting constantly blown out and injured, the Philadelphia Flyers were infamous for their goaltending trouble. They iced 8 goalies during the regular season, the highest of any team post-lockout.

read_csv('data/flyers_goaliedata_19.csv') %>%
  clean_names() %>%
  arrange(desc(gp)) %>%
  select(Player = player, Team = tm, `Games played` = gp, `Save %` = sv_percent) %>%
  kable() %>%
  kable_styling(bootstrap_options = c('condensed', 'hover'), full_width = F)
Player Team Games played Save %
Carter Hart PHI 31 0.917
Brian Elliott PHI 26 0.907
Anthony Stolarz PHI 12 0.902
Calvin Pickard PHI 11 0.863
Michal Neuvirth PHI 7 0.859
Cam Talbot PHI 4 0.881
Alex Lyon PHI 2 0.806
Mike McKenna PHI 1 0.833

Goalie demographics

It is well-known that most hockey players origniate from North America, Russia, or Scandinavia. Is this true for goalies specifically?

goalie_countries <- 
  cf_data %>%
    count(country, name = "num_of_goalies") %>%
    na.omit()

world <- 
  ne_countries(
    scale = "medium", 
    returnclass = "sf", 
    continent = c('North America', 'Europe')
  )

world %>%
  left_join(goalie_countries, by = c('name' = 'country')) %>%
  ggplot() +
  geom_sf(aes(fill = num_of_goalies)) +
  labs(fill = "Number of goalies") + 
  scale_fill_fish(option = "Prionace_glauca", direction = -1) +
  theme_void() +
  theme(legend.position = "bottom",
        legend.key.height = unit(2, 'mm'),
        legend.text = element_text(size = 7),
        legend.title = element_text(size = 8))

We can see that this holds true for goalies for the most part, but it’s interesting to note that there are goalies who originate from Denmark and even the UK.

The old hockey idea is that goalies are supposed to be big. The taller and wider they are, the more space they take up in the net. However, with the evolution of hockey skill came the need for more athletic goalies with lightning reflexes. The butterfly style, now commonplace in the NHL, requires goalies to be flexible. So what do goalies look like in the NHL now?

cf_data_player_height_weight <- 
  cf_data %>%
  select(player, weight, height, age, country) %>%
  mutate(
    weight = parse_number(sub(".*-","", weight)), 
    height = parse_number(sub(".*-","",height))
    ) %>%
  mutate(age = case_when(
    age < 20 ~ 'Under 20',
    age >= 20 & age < 25 ~ '20-24',
    age >= 25 & age < 30 ~ '25-29',
    age >= 30 & age < 35 ~ '30-34',
    age >= 35 & age < 40 ~ '35-39',
    age > 40 ~ 'Over 40',
  ))

p <-
  cf_data_player_height_weight %>%
  ggplot(aes(height, weight)) +
  geom_jitter(aes(color = age, label = player, label2 = country), 
              alpha = 0.65, size = 3) +
  labs(x = 'Goalies\'s Height (cm)', 
       y = 'Goalies\'s Weight (kg)', 
       color = 'Age') +
  scale_color_fish_d(option = 'Callanthias_australis')

ggplotly(p)

So we can see that the largest chunck of goalies seems to fall between 183 - 193cm (6’0 - 6’3 ft) in height and 85 - 95kg (187 - 210lbs) in weight. So only slightly taller and heavier than the average adult male, which is what you would expect from a professional athelete but is not quite what you would expect if you tend to think of goalies as big players.

The Best (and Worst) of the NHL

The top 10 averages by goalie

hr_data <- hr_data %>% mutate(w_percent = w/gp)

Salaries <- cf_data %>% 
  group_by(player) %>% 
  summarise(mean_aav = mean(aav)) %>% 
  mutate(player = replace(player,player == "Marc-André Fleury","Marc-Andre Fleury")) %>%
  mutate(player = replace(player,player == "Jaroslav Halák","Jaroslav Halak")) %>% 
  mutate(player = replace(player, player == "Eddie Läck","Eddie Lack")) %>% 
  mutate(player = replace(player, player == "Jacob Markström","Jacob Markstrom")) %>%
  mutate(player = replace(player, player == "Petr Mrázek","Petr Mrazek"))

Best_Goalies <- 
  hr_data %>% 
  filter(!(player == "Martin Jones" & szn == "14-15")) %>%
  group_by(player) %>% 
  summarise(gp = sum(gp),
            w = sum(w),
            ga = sum(ga),
            sa = sum(sa),
            sv = sum(sv),
            ) %>%
  filter(gp > 120) %>% 
  mutate(mean_w_percent = w/gp, 
         mean_sv_percent = sv/sa,
         mean_gaa = ga/gp,
         avg_sv = mean(mean_sv_percent),
         mean_gsaa = (sa * (1-avg_sv))- ga,
         )

Best_Goalies <- Best_Goalies %>% 
  left_join(Salaries, by = c("player" = "player"))

Mean_sv_plot <- 
  Best_Goalies %>% 
  top_n(10, mean_sv_percent) %>%
  ggplot(aes(x = mean_sv_percent, y = mean_aav)) + 
  geom_point(size = 2, color = '#a7d2cb') +
  geom_text_repel(aes(label = player), size = 3) +
  labs(x = "Save Percentage", y = "Average Annual Salary")

Mean_gaa_plot <- 
  Best_Goalies %>% 
  top_n(-10, mean_gaa) %>%
  ggplot(aes(x = mean_gaa, y = mean_aav)) + 
  geom_point(size = 2, color = '#f2d388') +
  geom_text_repel(aes(label = player), size = 3) +
  labs(x = "Goals Against Average", y = "Average Annual Salary")

mean_w_percent_plot <- 
  Best_Goalies %>% 
  top_n(10, mean_w_percent) %>% 
  ggplot(aes(x = mean_w_percent, y = mean_aav)) + 
  geom_point(size = 2, color = '#c98474') + 
  geom_text_repel(aes(label = player), size = 3) +
  labs(x = "Percentage of Games Won", y = "Average Annual Salary")

mean_gsaa_plot <- 
  Best_Goalies %>% 
  top_n(10, mean_gsaa) %>% 
  ggplot(aes(x = mean_gsaa,y = mean_aav))+ 
  geom_point(size = 2, color = '#874c62') + 
  geom_text_repel(aes(label = player), size = 3) +
  labs(x = "Goals Saved Above Average", y = "Average Annual Salary")

(Mean_sv_plot | Mean_gaa_plot) / (mean_w_percent_plot | mean_gsaa_plot)

Top goalies by season

# Finding the best goalies by Season
# creates tables for each stat we are looking at

top_goalie_sv <- function(season){
  top_goalie_stats <- hr_data %>% filter(szn == season, gp > 40) 
  top_goalie_sv_percent <- top_goalie_stats %>% top_n(1,sv_percent) %>%
    select(player, sv_percent, szn)
  return(top_goalie_sv_percent)
}

top_goalie_gaa <- function(season){
  top_goalie_stats <- hr_data %>% 
    filter(szn == season, gp > 40)
  goalie_gaa <- top_goalie_stats %>% 
    top_n(-1,gaa) %>% 
    select(player, gaa, szn)
  return(goalie_gaa)
}


top_goalie_w_percent <- function(season){
  top_goalie_stats <- hr_data %>% filter(szn == season, gp > 40)
  goalie_w_percent <- top_goalie_stats %>% top_n(1,w_percent) %>% 
    select(player, w_percent, szn)
  return(goalie_w_percent)
}


top_goalie_gsaa <- function(season){
  top_goalie_stats <- hr_data %>% filter(szn == season, gp > 40)
  goalie_gsaa <- top_goalie_stats %>% top_n(1,gsaa) %>% 
    select(player, gsaa, szn)
  return(goalie_gsaa)
}

#Loop to fill in the tables

best_sv_year <- tibble()
best_gaa_year <- tibble()
best_win_percent_year <- tibble()
best_gsaa_year <- tibble()
for (i in 1:7){
  best_sv_year <- bind_rows(best_sv_year,top_goalie_sv(hr_szns[i]))
  best_gaa_year <- bind_rows(best_gaa_year,top_goalie_gaa(hr_szns[i]))
  best_win_percent_year <- bind_rows(best_win_percent_year,top_goalie_w_percent(hr_szns[i]))
  best_gsaa_year <- bind_rows(best_gsaa_year,top_goalie_gsaa(hr_szns[i]))
}



#Plots to see best goalies by season 


sv_plot <- 
  best_sv_year %>% 
  ggplot(aes(x = szn, y = sv_percent)) + 
  geom_point(size = 2, color = '#a7d2cb') +
  geom_text_repel(aes(label = player), size = 3) +
  coord_flip() + 
  labs(x = "Season", y = "Save Percentage")

gaa_plot <- 
  best_gaa_year %>% 
  ggplot(aes(x =szn, y = gaa)) + 
  geom_point(size = 2, color = '#f2d388') +
  geom_text_repel(aes(label = player), size = 3) +
  coord_flip() + 
  labs(x = "Season", y = "Goals Against Average")

w_percent_plot <- 
  best_win_percent_year %>% 
  ggplot(aes(x = szn, y = w_percent)) + 
  geom_point(size = 2, color = '#c98474') + 
  geom_text_repel(aes(label = player), size = 3) +
  coord_flip() + 
  labs(x = "Season", y = "Win Percentage")

gsaa_plot <- 
  best_gsaa_year %>% 
  ggplot(aes(x = szn, y = gsaa)) + 
  geom_point(size = 2, color = '#874c62') + 
  geom_text_repel(aes(label = player), size = 3) +
  coord_flip() + 
  labs(x = "Season", y = "Goals Saved Above Average")

year_plots <- (sv_plot | gaa_plot) / (w_percent_plot | gsaa_plot)

year_plots 

# + plot_annotation(
#   title = "Top Goalies by Season"
# )

Worst Goalies over past seasons

#Worst Goalies by averages overall

worst_mean_sv_plot <- Best_Goalies %>% 
  top_n(-10,mean_sv_percent) %>%
  ggplot(aes(x = mean_sv_percent, y = mean_aav)) + 
  geom_point(size = 2, color = '#a7d2cb') +
  geom_text_repel(aes(label = player), size = 3) +
  labs(x = "Save Percentage", y = "Average Annual Salary")

worst_mean_gaa_plot <- Best_Goalies %>% 
  top_n(10,mean_gaa) %>%
  ggplot(aes(x = mean_gaa, y = mean_aav)) + 
  geom_point(size = 2, color = '#f2d388') +
  geom_text_repel(aes(label = player), size = 3) +
  labs(x = "Goals Against Average", y = "Average Annual Salary")

worst_mean_w_percent_plot <- Best_Goalies %>% 
  top_n(-10, mean_w_percent) %>% 
  ggplot(aes(x = mean_w_percent, y = mean_aav)) + 
  geom_point(size = 2, color = '#c98474') + 
  geom_text_repel(aes(label = player), size = 3) +
  labs(x = "Percentage of Games Won", y = "Average Annual Salary")


worst_mean_gsaa_plot <- Best_Goalies %>% 
  top_n(-10, mean_gsaa) %>% 
  ggplot(aes(x = mean_gsaa,y = mean_aav))+ 
  geom_point(size = 2, color = '#874c62') + 
  geom_text_repel(aes(label = player), size = 3) +
  labs(x = "Goals Saved Above Average", y = "Average Annual Salary")


Worst_average_plots <- (worst_mean_sv_plot | worst_mean_gaa_plot) / (worst_mean_w_percent_plot | worst_mean_gsaa_plot)

Worst_average_plots 

Worst goalies over last 6 seasons

# Some really bad goalies by year 

missing_salary <- 
  read_xls("data/MH_nhl_goalies_2017-2018.xls") %>% 
  clean_names() %>% 
  unite('player', c('first_name', 'last_name'), sep = ' ') %>% 
  filter(player == "Scott Darling") %>% 
  select(player, salary)

worst_goalies <- hr_data %>%
  filter(gp > 40 & sv_percent < 0.9)

worst_salaries <- cf_data %>% select(player, aav, szn)

worst_goalies <- worst_goalies %>% 
  left_join(worst_salaries,by = c("player" = "player", "szn" = "szn")) %>%
  left_join(missing_salary, by = c("player" = "player")) %>% 
  mutate(aav = replace_na(aav,0)) %>% 
  mutate(salary = replace_na(salary,0)) %>% 
  mutate(aav = aav+ salary) %>% 
  unite('player_szn', c('player','szn'), sep = ' | Season:', remove = FALSE) %>%
  select(-salary) %>%
  mutate(w_percent = w/gp)

wg_sv_percent <- worst_goalies %>% 
  ggplot(aes(x = sv_percent, y = aav)) + 
  geom_point(size = 2, color = '#a7d2cb') +
  geom_text_repel(aes(label = player), size = 3) +
  labs(x = "Save Percentage", y = "Annual Average Salary")

wg_gsaa <- worst_goalies %>% ggplot(aes(x = gsaa, y = aav)) + 
  geom_point(size = 2, color = '#f2d388') +
  geom_text_repel(aes(label = player), size = 3) +
  labs(x = "Goals Saved Above Average", y = "Annual Average Salary")

wg_gaa <- worst_goalies %>% ggplot(aes(x = gaa, y = aav)) + 
  geom_point(size = 2, color = '#c98474') + 
  geom_text_repel(aes(label = player), size = 3) +
  labs(x = "Goals Against Average", y = "Annual Average Salary")

wg_w_percent <- worst_goalies %>% 
  ggplot(aes(x = w_percent, y = aav)) + 
  geom_point(size = 2, color = '#874c62') + 
  geom_text_repel(aes(label = player), size = 3) +
  labs(x = "Win Percentage", y = "Annual Average Salary")

Worst_goalies_plot <- (wg_sv_percent | wg_gaa) / (wg_w_percent | wg_gsaa)

Worst_goalies_plot 

# + plot_annotation(
#   title = "Worst perfoming goalies and how much they get paid"
# )

First place team goalie performance

Red point is league average.

league_avgs <-
  hr_data %>%
  filter(gp > 20) %>%
  group_by(szn) %>%
  mutate(w_percent = w/gp) %>%
  summarise(
    avg_sv_percent = mean(sv_percent),
    avg_gaa = mean(gaa),
    avg_w_percent = mean(w_percent),
    avg_gsaa = mean(gsaa)
  )

starting_goalies_best <- 
  standings_data %>%
    filter(rk == 1) %>%
    select(-w, -l, -ol, -pts) %>%
    left_join(read_csv('data/NHLTeams.csv')) %>%
    left_join(hr_data, by = c('abbrev' = 'team', 'szn' = 'szn')) %>%
    group_by(team, szn) %>%
    top_n(1, gp) %>%
    mutate(w_percent = w/gp)

starting_goalies_sv_percent <- 
  ggplot() +
    geom_point(data = starting_goalies_best, 
               aes(x = szn, y = sv_percent, color = team), 
               show.legend = F, size = 4) +
    geom_point(data = league_avgs,
               aes(x = szn, y = avg_sv_percent),
               show.legend = F, size = 3, color = 'red3') +
    geom_text_repel(data = starting_goalies_best, 
                    aes(x = szn, y = sv_percent, label = player), 
                    size = 3) +
    labs(y = 'Save percent', x = 'Season') +
    ggtitle('Save percent') +
    scale_color_fish_d(option = "Callanthias_australis") +
    coord_flip() +
    theme_hc()

starting_goalies_gaa <- 
  ggplot() +
    geom_point(data = starting_goalies_best, 
               aes(x = szn, y = gaa, color = team), 
               show.legend = F, size = 4) +
    geom_point(data = league_avgs,
               aes(x = szn, y = avg_gaa),
               show.legend = F, size = 3, color = 'red3') +
    geom_text_repel(data = starting_goalies_best, 
                    aes(x = szn, y = gaa, label = player), 
                    size = 3) +
    labs(y = 'Goals against avg', x = NULL) +
    ggtitle('Goals against average') +
    scale_color_fish_d(option = "Callanthias_australis") +
    coord_flip()

starting_goalies_comb <- starting_goalies_sv_percent / starting_goalies_gaa
starting_goalies_comb

Goalies Contracts


  • show how much goalies get paid
  • how has that changed over time? (line plot)
  • Do this depend on how good their team is?

How Contracts Affect Pay


  • How good are goalies that got paid a lot? (vice versa)

How Much Will Goalies Get Paid


  • how many goalies are we looking at?
  • show where the goalies are from (it could be a single plot since most of the goalies will be the same over the years), what do they look like? (height & weight)
    • plot showing where they are from
    • plot showing height/weight
  • who are the best ones?
    • plot of their avg szn

Conclusions


  • how many goalies are we looking at?
  • show where the goalies are from (it could be a single plot since most of the goalies will be the same over the years), what do they look like? (height & weight)
    • plot showing where they are from
    • plot showing height/weight
  • who are the best ones?
    • plot of their avg szn

Data


  • how many goalies are we looking at?
  • show where the goalies are from (it could be a single plot since most of the goalies will be the same over the years), what do they look like? (height & weight)
    • plot showing where they are from
    • plot showing height/weight
  • who are the best ones?
    • plot of their avg szn